Abstract
We previously demonstrated that T-lineage acute lymphoblastic leukemia (T-ALL) can be classified into 15 molecular subtypes based on whole transcriptome, exome, and genome sequencing (WTS/WGS)1. These subtypes are defined by distinct drivers and co-occurring alterations, correspond to specific T-cell developmental stages, and show differences in clinical outcomes. Using integrated multi-omics, we identified putative coding and non-coding driver alterations in over 95% of cases, highlighting the value of genomics-based classification for mechanistic investigation and clinical risk stratification.
However, implementing such a WGS/WTS classification system is challenging. Accurate interpretation of WGS and WTS data requires advanced computational and genomics expertise, particularly for detecting non-coding or cryptic alterations. Technical issues, such as low tumor purity or tumor-in-normal (TIN) contamination, pose challenges for variant detection. Furthermore, reliance on a single omics modality may result in ambiguous subtype calls, especially in complex or borderline cases.
To address these challenges and facilitate clinical translation, we developed TALLForest, an automated WTS/WGS-based classifier that integrates gene expression, structural variants (SVs), copy number variants (CNVs), and small variants (SNVs/indels) to assign consensus molecular subtypes in T-ALL.
TALLForest classifies samples using a gene expression-based random forest (RF) model trained on 1,145 high-quality samples from patients enrolled in the AALL0434 trial. Included samples had ≥70% tumor purity, concordant transcriptomic and genomic subtype annotations, and a minimum of 10 cases per subtype, resulting in 13 subtypes used for training. Subtypes with insufficient representation (NKX2-5, NUP98, NUP214) were excluded, as was the TME-enriched subtype due to low purity and unclear driver events. TALLForest then extracts and annotates genomic variants from WGS or WTS data, including coding and non-coding regions. A genetic subtype is assigned based on the presence of class-defining alterations. Finally, a consensus classification is derived by integrating gene expression and variant-based calls. We benchmarked TALLForest under WTS-only and WGS+WTS settings using per-subtype confusion matrices and overall accuracy. Robustness was further assessed on samples with TIN contamination and low purity. External validation was performed using 514 T-ALL samples from pediatric and adult cohorts, including both published and unpublished datasets. Samples were included if they had matched WGS and WTS data, or if a subtype-defining driver alteration could be identified from WTS data alone.
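The final integration step described above can be illustrated with a minimal sketch. The abstract does not state the rule TALLForest uses to reconcile the expression-based and variant-based calls, so the logic below is an assumption made purely for illustration: agreement yields a call supported by both modalities, a missing driver falls back to the expression call, and disagreement is flagged rather than resolved. The names `ConsensusCall`, `consensus_subtype`, and `min_probability`, and the placeholder subtype labels, are hypothetical and not taken from TALLForest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConsensusCall:
    """Result of combining two subtype calls (hypothetical structure)."""
    subtype: Optional[str]  # None when no confident call can be made
    evidence: str           # "both", "expression_only", "genetic_only",
                            # "discordant", or "none"


def consensus_subtype(
    expression_call: Optional[str],
    genetic_call: Optional[str],
    rf_probability: float = 1.0,
    min_probability: float = 0.5,  # assumed cutoff, not from the abstract
) -> ConsensusCall:
    """Combine an expression-based call with a driver-based call.

    ASSUMPTION: this rule set is illustrative only; the actual
    integration logic is not described in the abstract.
    """
    # Treat a low-probability random forest call as absent.
    expression_ok = (
        expression_call is not None and rf_probability >= min_probability
    )

    if expression_ok and genetic_call is not None:
        if expression_call == genetic_call:
            return ConsensusCall(expression_call, "both")
        # Conflicting modalities: flag for review instead of guessing.
        return ConsensusCall(None, "discordant")
    if expression_ok:
        # No class-defining alteration detected (e.g. WTS-only data).
        return ConsensusCall(expression_call, "expression_only")
    if genetic_call is not None:
        return ConsensusCall(genetic_call, "genetic_only")
    return ConsensusCall(None, "none")


if __name__ == "__main__":
    # Placeholder labels; real subtype names would come from the classifier.
    print(consensus_subtype("SUBTYPE_A", "SUBTYPE_A", rf_probability=0.9))
    print(consensus_subtype("SUBTYPE_A", None, rf_probability=0.9))
    print(consensus_subtype("SUBTYPE_A", "SUBTYPE_B", rf_probability=0.9))
```

Recording the evidence type alongside the subtype makes it straightforward to report how many calls are supported by a driver alteration, as opposed to expression alone.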
On the AALL0434 training dataset, TALLForest achieved 100% classification accuracy with WGS+WTS data. In 96% of cases, the assigned subtype was supported by a class-defining driver alteration. Tumor-only WGS variant calling performed comparably to tumor-normal paired WGS calling. In 62 samples with <70% leukemia blasts and 47 samples with 10-50% TIN, accuracy remained high at 94%, demonstrating robustness in non-ideal settings. With WTS-only data, classification accuracy remained high (99%), though only 53% of cases had detectable subtype-defining driver alterations, reflecting the reduced sensitivity of WTS for identifying non-coding genomic events.
External validation on 514 T-ALL cases demonstrated 95.5% overall accuracy. Among correctly classified samples, 92% were supported by both gene expression and genomic alterations. In WTS-only samples, TALLForest achieved 94% accuracy, with 39% showing subtype-defining driver alterations.
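For readers who want to reproduce this style of benchmarking on their own cohorts, the per-subtype confusion matrix and overall accuracy can be computed as in the sketch below. It uses only the Python standard library; the function names and the toy labels are hypothetical and do not come from the TALLForest code or its data.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_matrix(
    truth: Sequence[str], predicted: Sequence[str]
) -> Counter:
    """Count (true subtype, predicted subtype) pairs."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    return Counter(zip(truth, predicted))


def overall_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of samples whose predicted subtype matches the truth."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def per_subtype_recall(
    truth: Sequence[str], predicted: Sequence[str]
) -> Dict[str, float]:
    """For each true subtype, the fraction of its samples called correctly."""
    matrix = confusion_matrix(truth, predicted)
    totals = Counter(truth)
    return {s: matrix[(s, s)] / n for s, n in totals.items()}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = ["A", "A", "B", "B", "C"]
    predicted = ["A", "B", "B", "B", "C"]
    print(confusion_matrix(truth, predicted))
    print(overall_accuracy(truth, predicted))
    print(per_subtype_recall(truth, predicted))
```

Reporting per-subtype recall next to overall accuracy helps reveal whether rare subtypes are classified as reliably as common ones.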
In summary, TALLForest provides robust and accurate molecular classification of T-ALL using integrated transcriptomic and genomic data. It performs well across diverse sequencing conditions, including low-purity samples, WTS-only datasets, and tumor-only or paired WGS, and enables consistent subtype assignment supported by driver alterations. TALLForest represents a key step toward incorporating molecular classification into translational research and clinical care for T-ALL.
Pölönen P, Di Giacomo D, Seffernick AE, et al. The genomic basis of childhood T-lineage acute lymphoblastic leukaemia. Nature. 2024;632(8027):1082-1091.